A Measure of Speech and Pitch Reliability from Voicing
Abstract
We propose a computational auditory scene analysis (CASA) method for labelling the time-frequency (TF) representation, based on the periodicity of speech, which is related to voicing. A local voicing index is estimated in four subbands after demodulation of the signal. This index is used as a reliability measure for both pitch identification and speech recognition. First, the model allows robust f0 identification, because the voicing index is a consistent reliability measure associated with the f0 estimate. Since the task is to recognise speech corrupted with additive noise, the periodicity is specific to the speech signal. In our model, the evaluation of "speech reliability" is not direct: it also depends on a priori knowledge about the relationship between SNR and the voicing index. The goal is to assign to each TF region a probability of being "clean" enough to feed a multistream recogniser adapted only to clean data. The model is able to localise regions where the target speech dominates over the background noise. This probability is evaluated according to a function established a priori, the SNR-feature mapping, and the choice of an SNR decision threshold. The method suits a new multistream recognition approach [7], since these probabilities serve to weight the streams' posteriors.
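The chain described in the abstract (voicing index, SNR-feature mapping, SNR threshold, stream weights) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the linear mapping from voicing index to expected SNR, the Gaussian uncertainty around it, all numeric defaults, the function names, and the use of a plain reliability-weighted average of posteriors, which is not necessarily the combination rule of [7].

```python
import math


def clean_probability(voicing_index, snr_threshold_db=0.0,
                      slope_db=30.0, offset_db=-15.0, spread_db=5.0):
    """Hypothetical SNR-feature mapping: voicing index in [0, 1] -> P(SNR > threshold).

    The linear mapping and the Gaussian spread are illustrative assumptions;
    the paper only states that such a mapping is established a priori.
    """
    # Assumed: expected local SNR (dB) grows linearly with the voicing index.
    expected_snr_db = offset_db + slope_db * voicing_index
    # Assumed: Gaussian uncertainty around that estimate, so the probability
    # of exceeding the decision threshold is a Gaussian tail integral.
    z = (expected_snr_db - snr_threshold_db) / (spread_db * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def weight_stream_posteriors(stream_posteriors, stream_probs):
    """Combine per-subband class posteriors as a reliability-weighted average.

    This simple average stands in for the multistream combination; it is an
    assumption, not the rule used in [7].
    """
    n_classes = len(stream_posteriors[0])
    total = sum(stream_probs)
    if total == 0.0:
        # No stream is considered reliable: fall back to a uniform posterior.
        return [1.0 / n_classes] * n_classes
    combined = [0.0] * n_classes
    for posterior, prob in zip(stream_posteriors, stream_probs):
        for k, value in enumerate(posterior):
            combined[k] += prob * value / total
    return combined


# Four subbands, as in the abstract; the index values are made up.
voicing_indices = [0.9, 0.7, 0.3, 0.1]
probs = [clean_probability(v) for v in voicing_indices]
posteriors = [[0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]
combined = weight_stream_posteriors(posteriors, probs)
```

In this sketch, subbands with a high voicing index receive a probability near 1 and dominate the combined posterior, while weakly voiced subbands contribute little. Moving `snr_threshold_db` shifts how strict the "clean" decision is.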
Similar resources
Multi-band summary correlogram-based pitch detection for noisy speech
A multi-band summary correlogram (MBSC)-based pitch detection algorithm (PDA) is proposed. The PDA performs pitch estimation and voiced/unvoiced (V/UV) detection via novel signal processing schemes that are designed to enhance the MBSC’s peaks at the most likely pitch period. These peak-enhancement schemes include comb-filter channel-weighting to yield each individual subband’s summary correlog...
A Pre-processing Method to Modify Irregular Pitch Variations for Quality Enhancement of Synthesised Speech
In low bit rate speech coders, pitch and voicing level estimation play an important role in the quality of the synthesised speech. Although pitch usually evolves smoothly, it sometimes varies irregularly, and as a result the estimated pitch and voicing level differ from the real ones. This affects the performance of the speech coder. We propose to use a new modification as a preprocessor. ...
Enhancement of esophageal speech using formant synthesis
The feasibility of using the formant analysis-synthesis approach to replace the voicing sources of esophageal speech was explored. The voicing sources were generated by using inverse-filtered signals extracted from normal speakers. Pitch extraction was tested with various methods, and a simple auto-correlation method was then chosen. A special hardware unit was designed to perform the ...
Investigating the effect of speech sample duration on habitual pitch in normal women aged 18 to 30
Introduction: Habitual pitch is the perceptual correlate of the mean fundamental frequency of speech. Clinical evaluation addresses whether a person's habitual pitch falls within the normal range. Because abnormal pitch is a common feature of many voice disorders, assessing habitual pitch and the factors affecting it may help scientists to determine the exist...
MAP prediction of pitch from MFCC vectors for speech reconstruction
This work proposes a method of predicting pitch and voicing from mel-frequency cepstral coefficient (MFCC) vectors. Two maximum a posteriori (MAP) methods are considered. The first models the joint distribution of the MFCC vector and pitch using a Gaussian mixture model (GMM) while the second method also models the temporal correlation of the pitch contour using a combined hidden Markov model (...